street view image
- Asia > Singapore (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology (1.00)
- Transportation > Infrastructure & Services (0.34)
Predicting Household Water Consumption Using Satellite and Street View Images in Two Indian Cities
Monitoring household water use in rapidly urbanizing regions is hampered by costly, time-intensive enumeration methods and surveys. We investigate whether publicly available imagery (satellite tiles and Google Street View (GSV) segmentation) and simple geospatial covariates (nightlight intensity, population density) can be used to predict household water consumption in Hubballi-Dharwad, India. We compare four approaches: survey features (benchmark), CNN embeddings (satellite, GSV, combined), and GSV semantic maps with auxiliary data. Under an ordinal classification framework, GSV segmentation plus remote-sensing covariates achieves 0.55 accuracy for water use, approaching survey-based models (0.59 accuracy). Error analysis shows high precision at the extremes of the household water consumption distribution, with confusion among the middle classes due to overlapping visual proxies. We also compare and contrast our estimates of household water consumption with those of household subjective income. Our findings demonstrate that open-access imagery, coupled with minimal geospatial data, offers a promising alternative to surveys for obtaining reliable household water consumption estimates in urban analytics.
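As an illustration of the ordinal classification setup described above, the sketch below combines hypothetical GSV segmentation class fractions with nightlight and population-density covariates and trains cumulative "greater-than-tier" classifiers; the feature names, tier count, and cumulative-link construction are assumptions for illustration, not the authors' exact pipeline.

```python
# Hypothetical features: 5 GSV segmentation class fractions + 2 geospatial covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.dirichlet(np.ones(5), size=n),   # GSV class fractions (e.g. road, building, vegetation, sky, other)
    rng.gamma(2.0, 10.0, size=n),        # nightlight intensity
    rng.gamma(2.0, 500.0, size=n),       # population density
])
y = rng.integers(0, 4, size=n)           # ordinal water-use tier: 0 (low) .. 3 (high), synthetic labels

# Ordinal classification via K-1 cumulative "is tier > k" binary models.
cumulative_models = [
    LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int)) for k in range(3)
]

def predict_tier(x):
    """Predict a tier by summing the cumulative P(tier > k) probabilities."""
    p_gt = [m.predict_proba(x.reshape(1, -1))[0, 1] for m in cumulative_models]
    return int(round(sum(p_gt)))

print(predict_tier(X[0]))
```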
- North America > United States > District of Columbia > Washington (0.40)
- Asia > India > Karnataka (0.05)
- South America (0.04)
- Education > Health & Safety > School Nutrition (1.00)
- Water & Waste Management > Water Management (0.94)
- Banking & Finance (0.83)
CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning
Liu, Tianhui; Pang, Hetian; Zhang, Xin; Feng, Jie; Li, Yong; Hui, Pan
Urban socio-economic sensing, which harnesses publicly available, large-scale web data such as street view and satellite imagery, is of paramount importance for achieving global sustainable development goals. With the emergence of Large Vision-Language Models (LVLMs), new opportunities have arisen to solve this task by treating it as a multi-modal perception and understanding problem. However, recent studies reveal that LVLMs still struggle with accurate and interpretable socio-economic predictions from visual data. To address these limitations and maximize the potential of LVLMs, we introduce \textbf{CityRiSE}, a novel framework for \textbf{R}eason\textbf{i}ng urban \textbf{S}ocio-\textbf{E}conomic status in LVLMs through pure reinforcement learning (RL). With carefully curated multi-modal data and a verifiable reward design, our approach guides the LVLM to focus on semantically meaningful visual cues, enabling structured and goal-oriented reasoning for generalist socio-economic status prediction. Experiments demonstrate that CityRiSE, with its emergent reasoning process, significantly outperforms existing baselines, improving both prediction accuracy and generalization across diverse urban contexts, particularly for prediction on unseen cities and unseen indicators. This work highlights the promise of combining RL and LVLMs for interpretable and generalist urban socio-economic sensing.
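A minimal sketch of what a verifiable reward for this kind of RL fine-tuning could look like is given below; the answer-tag format, the ordinal bin scheme, and the partial-credit weighting are assumptions for illustration, not CityRiSE's published reward design.

```python
import re

def verifiable_reward(response: str, true_bin: int, num_bins: int = 5) -> float:
    """Format bonus plus graded accuracy on the predicted ordinal socio-economic bin."""
    match = re.search(r"<answer>\s*(\d+)\s*</answer>", response)
    if match is None:
        return 0.0                                   # unverifiable output earns nothing
    pred_bin = int(match.group(1))
    if not 0 <= pred_bin < num_bins:
        return 0.1                                   # well-formed but out of range
    distance = abs(pred_bin - true_bin)
    return 0.1 + 0.9 * (1.0 - distance / (num_bins - 1))  # 1.0 for an exact match

print(verifiable_reward("<think>dense low-rise housing, unpaved road</think><answer>1</answer>", true_bin=1))
```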
- Asia > China > Beijing > Beijing (0.04)
- Africa > East Africa (0.04)
- South America > Brazil > São Paulo (0.04)
- Health & Medicine (0.93)
- Banking & Finance > Economy (0.31)
Inconsistent Affective Reaction: Sentiment of Perception and Opinion in Urban Environments
The growth of social media platforms has transformed our understanding of urban environments, giving rise to nuanced variations in the sentiment embedded in human perception and opinion and challenging existing multidimensional sentiment analysis approaches in urban studies. This study presents novel methodologies for identifying and elucidating sentiment inconsistency, constructing a dataset of 140,750 Baidu and Tencent street view images to measure perception and 984,024 Weibo social media posts to measure opinion. A reaction index is developed, integrating object detection and natural language processing techniques, to classify sentiment within Beijing's Second Ring for 2016 and 2022. The classified sentiment reactions are analysed and visualized using regression analysis, image segmentation, and word frequency across the land-use distribution to discern underlying factors. The perception affective reaction trend map reveals a shift toward more evenly distributed positive sentiment, while the opinion affective reaction trend map shows more extreme changes. Our mismatch map indicates significant disparities between the sentiments of human perception and opinion of urban areas over the years. Changes in sentiment reactions have significant relationships with elements such as dense buildings and pedestrian presence. Our inconsistency maps capture perception and opinion sentiment before and after the pandemic and offer potential explanations and directions for environmental management and for formulating strategies for urban renewal.
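The sketch below illustrates one way a perception-versus-opinion mismatch index per spatial unit could be computed; the per-cell mean aggregation and the signed-difference mismatch are illustrative assumptions, not the paper's exact reaction index.

```python
from collections import defaultdict
from statistics import mean

# (grid_cell_id, sentiment in [-1, 1]) pairs, e.g. from street view object detection
# (perception) and from social media text classification (opinion); values are synthetic.
perception = [("cell_07", 0.4), ("cell_07", 0.1), ("cell_12", -0.3)]
opinion = [("cell_07", -0.5), ("cell_12", 0.2), ("cell_12", 0.6)]

def cell_means(pairs):
    buckets = defaultdict(list)
    for cell, score in pairs:
        buckets[cell].append(score)
    return {cell: mean(scores) for cell, scores in buckets.items()}

def mismatch(perception_pairs, opinion_pairs):
    """Signed gap between perception and opinion sentiment for cells present in both."""
    p, o = cell_means(perception_pairs), cell_means(opinion_pairs)
    return {cell: p[cell] - o[cell] for cell in p.keys() & o.keys()}

print(mismatch(perception, opinion))   # roughly {'cell_07': 0.75, 'cell_12': -0.7}
```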
- Asia > China > Beijing > Beijing (0.26)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Law (0.49)
- Health & Medicine (0.30)
- Asia > Singapore (0.06)
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology (1.00)
- Transportation > Infrastructure & Services (0.34)
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Feng, Jie; Wang, Shengyuan; Liu, Tianhui; Xi, Yanxin; Li, Yong
Urban research involves a wide range of scenarios and tasks that require the understanding of multi-modal data. Current methods often focus on specific data types and lack a unified framework in the urban field for processing them comprehensively. The recent success of multi-modal large language models (MLLMs) presents a promising opportunity to overcome this limitation. In this paper, we introduce $\textit{UrbanLLaVA}$, a multi-modal large language model designed to process these diverse urban data types simultaneously and achieve strong performance across diverse urban tasks compared with general MLLMs. In $\textit{UrbanLLaVA}$, we first curate a diverse urban instruction dataset encompassing both single-modal and cross-modal urban data, spanning from the location view to the global view of the urban environment. Additionally, we propose a multi-stage training framework that decouples spatial reasoning enhancement from domain knowledge learning, thereby improving the compatibility and downstream performance of $\textit{UrbanLLaVA}$ across diverse urban tasks. Finally, we extend an existing benchmark for urban research to assess the performance of MLLMs across a wide range of urban tasks. Experimental results from three cities demonstrate that $\textit{UrbanLLaVA}$ outperforms both open-source and proprietary MLLMs on single-modal tasks and complex cross-modal tasks, and shows robust generalization abilities across cities. Source code and data are openly accessible to the research community via https://github.com/tsinghua-fib-lab/UrbanLLaVA.
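To make the multi-stage idea concrete, the sketch below alternates the set of trainable modules (and, implicitly, the data mix) across stages so that spatial-reasoning tuning is decoupled from domain-knowledge tuning; the module names, stage order, and toy objective are assumptions for illustration, not UrbanLLaVA's actual training framework.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the vision projector and the language model backbone.
model = nn.ModuleDict({
    "vision_projector": nn.Linear(512, 256),
    "language_model": nn.Sequential(nn.Linear(256, 256), nn.Linear(256, 256)),
})

STAGES = [
    {"name": "spatial_reasoning", "trainable": {"vision_projector"}, "steps": 3},
    {"name": "domain_knowledge", "trainable": {"vision_projector", "language_model"}, "steps": 3},
]

for stage in STAGES:
    # Freeze everything, then unfreeze only the modules this stage is meant to train.
    for name, module in model.items():
        for p in module.parameters():
            p.requires_grad = name in stage["trainable"]
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    for _ in range(stage["steps"]):
        batch = torch.randn(8, 512)                  # stand-in for this stage's instruction data
        loss = model["language_model"](model["vision_projector"](batch)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(stage["name"], "final loss:", float(loss))
```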
- Transportation > Ground > Road (0.68)
- Health & Medicine (0.67)
- Transportation > Infrastructure & Services (0.46)
- Education > Educational Setting (0.46)
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models
Tang, Yihong; Qu, Ao; Yu, Xujing; Deng, Weipeng; Ma, Jun; Zhao, Jinhua; Sun, Lijun
Urban and transportation research has long sought to uncover statistically meaningful relationships between key variables and societal outcomes such as road safety, in order to generate actionable insights that guide the planning, development, and renewal of urban and transportation systems. However, traditional workflows face several key challenges: (1) reliance on human experts to propose hypotheses, which is time-consuming and prone to confirmation bias; (2) limited interpretability, particularly in deep learning approaches; and (3) underutilization of unstructured data that can encode critical urban context. Given these limitations, we propose UrbanX, a Multimodal Large Language Model (MLLM)-based approach for interpretable hypothesis inference, enabling the automated generation, evaluation, and refinement of hypotheses concerning urban context and road safety outcomes. Our method leverages MLLMs to craft safety-relevant questions for street view images (SVIs), extract interpretable embeddings from their responses, and apply them in regression-based statistical models. UrbanX supports iterative hypothesis testing and refinement, guided by statistical evidence such as coefficient significance, thereby enabling rigorous scientific discovery of previously overlooked correlations between urban design and safety. Experimental evaluations on Manhattan street segments demonstrate that our approach outperforms pretrained deep learning models while offering full interpretability. Beyond road safety, UrbanX can serve as a general-purpose framework for urban scientific discovery, extracting structured insights from unstructured urban data across diverse socioeconomic and environmental outcomes. This approach enhances model trustworthiness for policy applications and establishes a scalable, statistically grounded pathway for interpretable knowledge discovery in urban and transportation studies.
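The regression step can be sketched as follows: hypothetical yes/no answers from an MLLM to safety-relevant questions become per-segment features, an OLS model relates them to a crash-rate outcome, and coefficient significance decides which hypotheses to keep or refine. The question names and synthetic data are assumptions for illustration, not UrbanX's exact pipeline.

```python
import numpy as np
import statsmodels.api as sm

questions = ["has_marked_crosswalk", "has_street_lighting", "visible_construction"]

rng = np.random.default_rng(1)
n_segments = 200
answers = rng.integers(0, 2, size=(n_segments, len(questions))).astype(float)  # synthetic yes/no answers
crash_rate = 2.0 - 0.8 * answers[:, 0] + 0.5 * answers[:, 2] + rng.normal(0, 0.5, n_segments)

ols = sm.OLS(crash_rate, sm.add_constant(answers)).fit()
for name, coef, p in zip(["const"] + questions, ols.params, ols.pvalues):
    verdict = "keep" if p < 0.05 else "refine or drop"
    print(f"{name:>24s}: coef={coef:+.3f}, p={p:.4f} -> {verdict}")
```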
- North America > Canada > Quebec > Montreal (0.14)
- Asia > China > Hong Kong (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
CityLens: Benchmarking Large Language-Vision Models for Urban Socioeconomic Sensing
Liu, Tianhui; Feng, Jie; Pang, Hetian; Zhang, Xin; Ouyang, Tianjian; Zhang, Zhiyuan; Li, Yong
Understanding urban socioeconomic conditions through visual data is a challenging yet essential task for sustainable urban development and policy planning. In this work, we introduce $\textbf{CityLens}$, a comprehensive benchmark designed to evaluate the capabilities of large language-vision models (LLVMs) in predicting socioeconomic indicators from satellite and street view imagery. We construct a multi-modal dataset covering a total of 17 globally distributed cities, spanning 6 key domains: economy, education, crime, transport, health, and environment, reflecting the multifaceted nature of urban life. Based on this dataset, we define 11 prediction tasks and utilize three evaluation paradigms: Direct Metric Prediction, Normalized Metric Estimation, and Feature-Based Regression. We benchmark 17 state-of-the-art LLVMs across these tasks. Our results reveal that while LLVMs demonstrate promising perceptual and reasoning capabilities, they still exhibit limitations in predicting urban socioeconomic indicators. CityLens provides a unified framework for diagnosing these limitations and guiding future efforts in using LLVMs to understand and predict urban socioeconomic patterns. Our codes and datasets are open-sourced via https://github.com/tsinghua-fib-lab/CityLens.
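A rough sketch of how a normalized-metric-style evaluation could be scored is shown below; the min-max normalization and the choice of R² plus rank correlation are assumptions for illustration, not CityLens's documented protocol.

```python
import numpy as np
from scipy.stats import spearmanr

def minmax(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def score(pred, truth):
    """Compare min-max-normalized predictions and ground truth for one indicator."""
    pred_n, truth_n = minmax(pred), minmax(truth)
    ss_res = np.sum((truth_n - pred_n) ** 2)
    ss_tot = np.sum((truth_n - truth_n.mean()) ** 2)
    return {"r2": 1 - ss_res / ss_tot, "spearman": spearmanr(pred_n, truth_n)[0]}

# Toy example: a per-city indicator (e.g. an economic index) versus model estimates.
print(score(pred=[0.2, 0.5, 0.9, 0.4], truth=[1000, 2400, 5200, 1800]))
```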
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Health & Medicine > Consumer Health (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.47)
- Banking & Finance > Real Estate (0.46)
- Health & Medicine > Public Health (0.46)
Unsupervised Urban Land Use Mapping with Street View Contrastive Clustering and a Geographical Prior
Che, Lin; Chen, Yizi; Jin, Tanhua; Raubal, Martin; Schindler, Konrad; Kiefer, Peter
Urban land use classification and mapping are critical for urban planning, resource management, and environmental monitoring. Existing remote sensing techniques often lack precision in complex urban environments due to the absence of ground-level details. Unlike aerial perspectives, street view images provide a ground-level view that captures more human and social activities relevant to land use in complex urban scenes. Existing street view-based methods primarily rely on supervised classification, which is challenged by the scarcity of high-quality labeled data and the difficulty of generalizing across diverse urban landscapes. This study introduces an unsupervised contrastive clustering model for street view images with a built-in geographical prior, to enhance clustering performance. When combined with a simple visual assignment of the clusters, our approach offers a flexible and customizable solution to land use mapping, tailored to the specific needs of urban planners. We experimentally show that our method can generate land use maps from geotagged street view image datasets of two cities. As our methodology relies on the universal spatial coherence of geospatial data ("Tobler's law"), it can be adapted to various settings where street view images are available, to enable scalable, unsupervised land use mapping and updating. The code will be available at https://github.com/lin102/CCGP.
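The sketch below shows one way a geographical prior could enter a contrastive objective: pairs of street view embeddings are pulled together with a weight that decays with geographic distance, in the spirit of Tobler's law. The kernel bandwidth and loss form are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def geo_weighted_contrastive(z, coords, tau=0.2, bandwidth=200.0):
    """z: (N, D) embeddings; coords: (N, 2) metric coordinates (e.g. metres)."""
    z = F.normalize(z, dim=1)
    eye = torch.eye(z.size(0), dtype=torch.bool)
    sim = (z @ z.T / tau).masked_fill(eye, -1e9)          # cosine similarities, self-pairs excluded
    geo_w = torch.exp(-(torch.cdist(coords, coords) / bandwidth) ** 2).masked_fill(eye, 0.0)
    log_p = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Weighted cross-entropy: geographically close images should be likely neighbors in embedding space.
    return -(geo_w * log_p).sum() / geo_w.sum()

z = torch.randn(32, 128, requires_grad=True)              # street view embeddings from the encoder
coords = torch.rand(32, 2) * 1000.0                       # toy locations inside a 1 km x 1 km tile
loss = geo_weighted_contrastive(z, coords)
loss.backward()
print(float(loss))
```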
- Europe > Switzerland > Zürich > Zürich (0.15)
- North America > United States > California > San Francisco County > San Francisco (0.14)
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations
Liu, Zeping; Zhang, Fan; Jiao, Junfeng; Lao, Ni; Mai, Gengchen
Advancements in vision and language foundation models have inspired the development of geo-foundation models (GeoFMs), enhancing performance across diverse geospatial tasks. However, many existing GeoFMs primarily focus on overhead remote sensing (RS) data while neglecting other data modalities such as ground-level imagery. A key challenge in multimodal GeoFM development is to explicitly model geospatial relationships across modalities, which enables generalizability across tasks, spatial scales, and temporal contexts. To address these limitations, we propose GAIR, a novel multimodal GeoFM architecture integrating overhead RS data, street view (SV) imagery, and their geolocation metadata. We utilize three factorized neural encoders to project an SV image, its geolocation, and an RS image into the embedding space. The SV image needs to be located within the RS image's spatial footprint but does not need to be at its geographic center. To geographically align the SV and RS images, we propose a novel implicit neural representation (INR) module that learns a continuous RS image representation and looks up the RS embedding at the SV image's geolocation. The geographically aligned SV, RS, and location embeddings are then trained with contrastive learning objectives on unlabeled data. We evaluate GAIR across 10 geospatial tasks spanning RS image-based, SV image-based, and location embedding-based benchmarks. Experimental results demonstrate that GAIR outperforms state-of-the-art GeoFMs and other strong baselines, highlighting its effectiveness in learning generalizable and transferable geospatial representations.
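A minimal sketch of the geo-alignment idea follows: a remote sensing feature map is treated as a continuous representation, sampled at each street view image's location within the tile footprint, and aligned with the SV embedding via a contrastive loss. Bilinear grid sampling stands in for the INR lookup, and all dimensions are illustrative assumptions rather than GAIR's architecture.

```python
import torch
import torch.nn.functional as F

def lookup_rs_embedding(rs_feat, sv_xy):
    """rs_feat: (B, D, H, W) per-tile feature maps; sv_xy: (B, 2) SV locations in [-1, 1] tile coordinates."""
    grid = sv_xy.view(-1, 1, 1, 2)                               # one sample point per tile
    sampled = F.grid_sample(rs_feat, grid, align_corners=True)   # bilinear lookup -> (B, D, 1, 1)
    return sampled.flatten(1)                                    # (B, D)

def info_nce(a, b, tau=0.07):
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.T / tau
    return F.cross_entropy(logits, torch.arange(a.size(0)))      # matched pairs on the diagonal

B, D = 16, 64
rs_feat = torch.randn(B, D, 8, 8, requires_grad=True)   # encoder output for each RS tile
sv_embed = torch.randn(B, D, requires_grad=True)        # encoder output for each SV image
sv_xy = torch.rand(B, 2) * 2 - 1                        # SV location inside its tile, not necessarily the center

loss = info_nce(sv_embed, lookup_rs_embedding(rs_feat, sv_xy))
loss.backward()
print(float(loss))
```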
- Africa > South Sudan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Colorado (0.04)